
100
uc   uc   • • •   uc
21   12   14

FIGURE 1



24  DISCOVER
26  EXTRACT
28  ENHANCE
30  GROUP

FIGURE 2




metadata
{multimedia URL, page URL, page title, page keywords, page description}

72

Future Validity Checks

300

Retry Extraction

Spider

Extraction Agent (aka "Cracker")
Interpretive Extraction & Database Retrieval

Validator (Annotation Agent 1)
Determines whether a multimedia URL is working

74

Virtual Domain Detector (Annotation Agent 2)
Determines whether a multimedia URL is a duplicate

76

Grouper (Annotation Agent 3)
Groups together variants of the same multimedia URL

Metadata Quality Improvement (Annotation Agent 4)
Enhances metadata. Aggregates metadata from other sources.

Full-Text Relevancy Ranking (Annotation Agent)
Sorts search results based on semantically distinct data fields

Promoter (Annotation Agent 5)
Normalizes metadata, selects and processes metadata into form for search system

Validity Changes

70

FIGURE 3



36
Seed Spider
Search
Retrieve Results
Parse Page
38   40   42
Queue

FIGURE 4
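
The "Parse Page" and "Queue" steps of Figure 4 could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the `MediaLinkParser` class, the `MEDIA_EXTENSIONS` list, and the example URLs are hypothetical and do not come from the drawings. The sketch collects links that appear to point at multimedia files and places them on a queue.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed list of streaming-media file extensions (hypothetical).
MEDIA_EXTENSIONS = (".rm", ".ram", ".asf", ".mp3", ".mov")


class MediaLinkParser(HTMLParser):
    """Collect href/src attribute values that look like multimedia URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.media_urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page URL.
                url = urljoin(self.base_url, value)
                if url.lower().endswith(MEDIA_EXTENSIONS):
                    self.media_urls.append(url)


page = (
    '<a href="clip.rm">clip</a>'
    '<a href="/about.html">about</a>'
    '<embed src="http://cdn.example.com/a.MP3">'
)
parser = MediaLinkParser("http://example.com/music/")
parser.feed(page)

# Queue the discovered multimedia URLs for a later extraction stage.
queue = deque(parser.media_urls)
```

Matching on file extension is a deliberately simple stand-in; the drawings do not state how the parser recognizes multimedia links.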



Dequeue and Distribute
Multimedia Specific Interpretive Metadata Extract
Store In Memory (DBMS)
Queue
46   48   50

FIGURE 5



60  Extract Metadata
62  Parse And Index Extracted Metadata Into Metadata Fields
64  Compare Fields With Known Databases
65  Correct/Replace/Add Fields
Queue

FIGURE 6
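
The Figure 6 flow (extract, parse into fields, compare with known databases, correct/replace/add fields) might be sketched in Python as follows. This is a minimal illustration, not the method of the drawings: the `KNOWN_ARTISTS` table, the "key: value" input format, and the helper names `parse_fields` and `enhance` are all assumptions introduced for the example.

```python
# Hypothetical reference table standing in for a "known database".
KNOWN_ARTISTS = {
    "miles davis": {"artist": "Miles Davis", "genre": "Jazz"},
}


def parse_fields(raw: str) -> dict:
    """Parse 'key: value' lines into a dict of metadata fields."""
    fields = {}
    for line in raw.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip().lower()] = value.strip()
    return fields


def enhance(fields: dict) -> dict:
    """Compare fields with the known table, then correct or add values."""
    result = dict(fields)
    known = KNOWN_ARTISTS.get(result.get("artist", "").lower())
    if known:
        # Replace the noisy artist value and add the missing genre.
        result.update(known)
    return result


raw = "Artist: miles davis\nTitle: So What"
record = enhance(parse_fields(raw))
```

In this sketch the lookup corrects the capitalization of the artist name and adds a genre field that the extracted metadata lacked.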



Separate Noisy Metadata Into Keywords
86
Perform Full-Text Query

FIGURE 7



Select And Sort URLs
102
Combine URLs Having Common Specified Attributes Into Bins
BINNING
ITERATIVE MASKING
Queue

FIGURE 8



112
Create Mask
110
Compare Mask With URL In Bin
116
Remove Character(s)
120
114
Collapse Similar URLs In Bin Into Single URL
122

FIGURE 9
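
One possible reading of the Figure 9 loop (create mask, compare mask with URLs in the bin, remove characters, collapse similar URLs) is sketched below in Python. The drawings do not say how the mask is built or which characters are removed, so the choices here are assumptions: the mask starts as the file-name stem of a URL and is shortened from the right until it is a prefix of another stem in the same bin. The names `stem`, `collapse_bin`, and `min_len` are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def stem(url: str) -> str:
    """File name without extension, e.g. 'song_56k' for '.../song_56k.rm'."""
    name = posixpath.basename(urlsplit(url).path)
    return posixpath.splitext(name)[0]


def collapse_bin(urls, min_len=4):
    """Group the URLs of one bin; returns {representative: [variants]}."""
    remaining = list(urls)
    groups = {}
    while remaining:
        first = remaining.pop(0)
        mask = stem(first)                      # create mask
        matches = []
        while len(mask) >= min_len:
            # Compare the mask with every other URL still in the bin.
            matches = [u for u in remaining if stem(u).startswith(mask)]
            if matches:
                break
            mask = mask[:-1]                    # remove a character and retry
        groups[first] = [first] + matches       # collapse into a single entry
        remaining = [u for u in remaining if u not in matches]
    return groups


bin_urls = [
    "http://example.com/a/song_28k.rm",
    "http://example.com/a/song_56k.rm",
    "http://example.com/a/talk_28k.rm",
]
groups = collapse_bin(bin_urls)
```

With these inputs the two `song_` variants collapse under the first URL, while the `talk_` URL stays in its own group. The `min_len` floor keeps the mask from shrinking to a prefix so short that unrelated files would match.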



Reorganize Fields In URL
Analyze First Field To Identify Associated Metadata
142   146
Analyze Next Field To Identify Associated Metadata
138   140   148
Add Associated Metadata To Original Metadata
Queue

FIGURE 10
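
The Figure 10 steps (reorganize the fields of a URL, analyze each field for associated metadata, add it to the original metadata) could be approximated by the Python sketch below. It is an assumption-laden illustration: the `FIELD_HINTS` lookup table, the way the URL is split into fields, and the function names are hypothetical and not taken from the drawings.

```python
from urllib.parse import urlsplit

# Hypothetical table mapping a URL field to the metadata it implies.
FIELD_HINTS = {
    "jazz": {"genre": "Jazz"},
    "live": {"recording": "Live"},
}


def url_fields(url: str) -> list:
    """Split a URL into an ordered list of host labels and path segments."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    fields = host.split(".") + parts.path.split("/")
    return [f.lower() for f in fields if f]


def add_url_metadata(url: str, metadata: dict) -> dict:
    """Analyze each field in turn and add any associated metadata."""
    enriched = dict(metadata)
    for field in url_fields(url):
        for key, value in FIELD_HINTS.get(field, {}).items():
            # Keep original values; only fill in what is missing.
            enriched.setdefault(key, value)
    return enriched


result = add_url_metadata(
    "http://media.example.com/jazz/live/so_what.rm",
    {"title": "So What"},
)
```

Using `setdefault` means metadata already present on the record is never overwritten by a guess derived from the URL, which is one conservative design choice among several possible ones.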



156  Semantically Sort And Categorize Metadata (Who, What, When, Where, etc.)
158  Technically Weight Each Category (Bit Rate, Duration, Fidelity, etc.)
160  Calculate Relevancy Score

FIGURE 11
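
A relevancy score along the lines of Figure 11 might be computed as in the Python sketch below. The drawings give no formula, so everything numeric here is an assumption: the `CATEGORY_WEIGHTS` values, the default weight of 0.5, and the bit-rate adjustment of up to 10 percent are hypothetical placeholders chosen only to show the shape of the calculation.

```python
# Hypothetical weights for the semantic categories.
CATEGORY_WEIGHTS = {"who": 3.0, "what": 2.0, "when": 1.0, "where": 1.0}


def relevancy_score(query_terms, categorized, bit_rate_kbps=0):
    """Weighted count of query-term hits per category, scaled by quality."""
    score = 0.0
    for category, text in categorized.items():
        words = set(text.lower().split())
        hits = sum(1 for term in query_terms if term.lower() in words)
        score += CATEGORY_WEIGHTS.get(category, 0.5) * hits
    # Assumed technical weighting: up to +10% for streams of 128 kbps or more.
    quality = 1.0 + 0.1 * min(bit_rate_kbps, 128) / 128
    return score * quality


categorized = {"who": "Miles Davis", "what": "Kind of Blue"}
score = relevancy_score(["miles", "blue"], categorized, bit_rate_kbps=128)
```

Here a match in the "who" category counts for more than a match in "what", and a higher bit rate nudges the score upward; duration and fidelity could be folded into the quality factor in the same way.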



